Abstract: Sphere decoding (SD) is a low-complexity maximum likelihood (ML) detection algorithm that has been adapted for different linear channels in digital communications. The complexity of SD has been shown to be exponential in some cases and polynomial in others, under certain assumptions. The sphere radius and the number of nodes visited during the tree search are the decisive factors for the complexity of the algorithm. The radius problem has been treated widely in the literature. In this paper, we propose a new structure for SD that drastically reduces the overall complexity. The complexity is measured in terms of the number of floating point operations (FLOPS) and the number of nodes visited during the algorithm's tree search. The reduction comes from the ability to decode the real and imaginary parts of each jointly detected symbol independently of each other, making use of the new lattice representation. We further show by simulations that the new approach achieves an 80% reduction in the overall complexity compared to the conventional SD for a 2x2 system, and almost a 50% reduction for the 4x4 and 6x6 cases, thus relaxing the requirements for hardware implementation.

I. Introduction 

Minimizing the bit error rate (BER), and thus improving performance, is the main challenge of receiver design for multiple-input multiple-output (MIMO) systems. However, performance improvements usually come at the cost of increased complexity in the receiver design. Assuming that the receiver has perfect knowledge of the channel H, different algorithms have been implemented to separate the data streams corresponding to the transmit antennas [1]. Among these algorithms, maximum likelihood (ML) detection is the optimum one. However, in MIMO systems, the ML problem becomes exponential in the number of possible constellation points, making the algorithm unsuitable for practical purposes [2]. Sphere decoding, or the Fincke-Pohst algorithm [3], on the other hand, reduces the computational complexity for the class of computationally hard combinatorial problems that arise in ML detection [4]-[5].

Complexity reduction techniques for SD have been proposed in the literature. Among these techniques, the increased radius search (IRS) [6] and the improved increasing radius search (IIRS) [7] improve the complexity efficiency of SD through a good choice of the sphere radius, which reduces the number of candidates in the search space. The former uses a set of sphere radii $c_1 < c_2 < \dots < c_n$ such that SD starts with $c_1$ and tries to find a candidate. If no candidate is found, SD executes again using the increased radius $c_2$. The algorithm continues until either a candidate is found or the radius is increased to $c_n$, which should be large enough to guarantee obtaining at least one candidate. The latter provides a mechanism to avoid the computations wasted in the former method when a certain radius $c_m$ does not lead to a candidate solution. Both techniques thus approach the complexity problem from the perspective of radius choice.
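The increasing radius schedule described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the implementation of [6]: the function name `increasing_radius_search` and the callable `search` (a stand-in for one complete sphere decoder run that returns a candidate, or `None` when the sphere is empty) are assumptions introduced here.

```python
from typing import Callable, Optional, Sequence, Tuple

Candidate = Tuple[int, ...]


def increasing_radius_search(
    search: Callable[[float], Optional[Candidate]],
    radii: Sequence[float],
) -> Tuple[Optional[Candidate], int]:
    """Run `search` with each radius in ascending order until it succeeds.

    `search(c)` is a hypothetical stand-in for one full sphere decoder run
    with radius c. Returns the candidate (or None if every sphere was empty)
    together with the number of runs that were needed.
    """
    runs = 0
    for c in sorted(radii):
        runs += 1
        candidate = search(c)
        if candidate is not None:
            return candidate, runs
    return None, runs


# Toy stand-in: the only lattice point lies at distance 2.5 from the center.
def toy_search(c: float) -> Optional[Candidate]:
    return (1, -1) if c >= 2.5 else None


result, runs = increasing_radius_search(toy_search, [1.0, 2.0, 3.0, 4.0])
```

In this toy run the first two radii produce empty spheres, so two complete searches are spent before a candidate appears. This repeated work is the cost that the improved variant [7] tries to avoid.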

In this paper, we improve the complexity efficiency of SD by reducing the number of FLOPS required by the algorithm, keeping in mind the importance of choosing a radius. The radius should not be so small that it results in an empty sphere, which forces the search to restart, and at the same time it should not be so large that it increases the number of lattice points to be searched. We use the formula presented in [8] for the radius, which is $d^2 = 2\sigma^2 N$, where $N$ is the problem dimension and $\sigma^2$ is the noise variance. The reduction in the number of FLOPS is accomplished by introducing a new and proper lattice representation, as well as by incorporating quantization at a certain level of the SD search. It is also important to mention that searching the lattice points using this new formulation can be performed in parallel, since the structure proposed in this paper enables decoding the real and imaginary parts of each symbol independently and at the same time.

The remainder of this paper is organized as follows: In Section II, a problem definition is introduced and a brief review of the conventional SD algorithm is presented. In Section III, we propose the new lattice representation and perform the mathematical derivations for complexity reduction. Performance and complexity comparisons for different numbers of antennas and modulation schemes are included in Section IV. Finally, we conclude the paper in Section V.

II. Problem Definition and the Conventional 
Sphere Decoder 

Consider a MIMO system with $N$ transmit and $M$ receive antennas. The received signal at each instant of time is given by

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{v} \tag{1}$$

where $\mathbf{y} \in \mathbb{C}^{M}$, $\mathbf{H} \in \mathbb{C}^{M \times N}$ is the channel matrix, $\mathbf{s} \in \mathbb{C}^{N}$ is an $N$-dimensional transmitted complex vector whose entries have real and imaginary parts that are integers, and $\mathbf{v} \in \mathbb{C}^{M}$ is the i.i.d. complex additive white Gaussian noise (AWGN) vector with zero mean and covariance matrix $\sigma^2\mathbf{I}$. Usually, the elements of the vector $\mathbf{s}$ are constrained to a finite set $\Omega$ where $\Omega \subset \mathbb{Z}^{2N}$, e.g., $\Omega = \{-3, -1, 1, 3\}^{2N}$ for 16-QAM (quadrature amplitude modulation), where $\mathbb{Z}$ and $\mathbb{C}$ denote the sets of integers and complex numbers, respectively.

Assuming $\mathbf{H}$ is known at the receiver, the ML detection is given by

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \Omega} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2. \tag{2}$$

Solving (2) becomes impractical for high transmission rates, since the search is exhaustive and its complexity grows exponentially. Therefore, instead of searching the whole space defined by all combinations drawn from the set $\Omega$, SD solves this problem by searching only over those lattice points or combinations that lie inside a sphere centered around the received vector $\mathbf{y}$ and of radius $d$. Introducing this constraint on (2) changes the problem to

$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \Omega} \|\mathbf{y} - \mathbf{H}\mathbf{s}\|^2 < d^2. \tag{3}$$
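For reference, the exhaustive search in (2) can be written directly, which makes the exponential cost visible: with 16-QAM and $N$ transmit antennas the loop below visits $16^N$ candidates. This is a minimal sketch in which the function name `ml_detect`, the example channel, and the noise-free received vector are illustrative assumptions rather than values from the paper.

```python
from itertools import product
from typing import Optional, Sequence, Tuple


def ml_detect(
    H: Sequence[Sequence[complex]],
    y: Sequence[complex],
    alphabet: Sequence[complex],
) -> Tuple[Optional[Tuple[complex, ...]], float]:
    """Exhaustive ML detection: minimize ||y - H s||^2 over all s in alphabet^N."""
    n_tx = len(H[0])
    best_s, best_metric = None, float("inf")
    for s in product(alphabet, repeat=n_tx):  # len(alphabet)**n_tx candidates
        metric = sum(
            abs(y_m - sum(h * s_n for h, s_n in zip(row, s))) ** 2
            for row, y_m in zip(H, y)
        )
        if metric < best_metric:
            best_s, best_metric = s, metric
    return best_s, best_metric


# 16-QAM alphabet: real and imaginary parts drawn from {-3, -1, 1, 3}.
levels = (-3, -1, 1, 3)
qam16 = [complex(a, b) for a in levels for b in levels]

# Illustrative 2x2 channel and a noise-free observation of s_true.
H = [[1 + 0.5j, 0.3 - 0.2j], [-0.4 + 0.1j, 0.8 + 0.9j]]
s_true = (complex(1, -3), complex(-1, 3))
y = [sum(h * s for h, s in zip(row, s_true)) for row in H]

s_hat, metric = ml_detect(H, y, qam16)
```

Even for this 2x2 example the search evaluates 256 candidate vectors, and the count grows to 65536 for a 4x4 system. This growth is the motivation for restricting the search to a sphere as in (3).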



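A frequently used solution for the QAM-modulated complex signal model given in (1) is to decompose the $N$-dimensional problem into a $2N$-dimensional real-valued problem, which can then be written as

$$\begin{bmatrix} \Re\{\mathbf{y}\} \\ \Im\{\mathbf{y}\} \end{bmatrix} = \begin{bmatrix} \Re\{\mathbf{H}\} & -\Im\{\mathbf{H}\} \\ \Im\{\mathbf{H}\} & \Re\{\mathbf{H}\} \end{bmatrix} \begin{bmatrix} \Re\{\mathbf{s}\} \\ \Im\{\mathbf{s}\} \end{bmatrix} + \begin{bmatrix} \Re\{\mathbf{v}\} \\ \Im\{\mathbf{v}\} \end{bmatrix} \tag{4}$$

where $\Re\{\mathbf{y}\}$ and $\Im\{\mathbf{y}\}$ denote the real and imaginary parts of $\mathbf{y}$, respectively [1], [4], [8]-[9]. Assuming $N = M$ in the sequel, and introducing the QR decomposition of $\mathbf{H}$, where $\mathbf{R}$ is an upper triangular matrix and the matrix $\mathbf{Q}$ is unitary, (3) can be written as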



$$\hat{\mathbf{s}} = \arg\min_{\mathbf{s} \in \Omega} \|\bar{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2 < d^2 \tag{5}$$

where $\bar{\mathbf{y}} = \mathbf{Q}^H\mathbf{y}$, with entries $\bar{y}_l$. Let $\mathbf{R} = [r_{i,j}]_{2N \times 2N}$ be the upper triangular matrix. Now, to solve (5), the SD algorithm constructs a tree, where the branches coming out of each node correspond to the elements drawn from the set $\Omega$. It then executes the decoding process starting with the last layer ($l = 2N$), which matches the first level in the tree, calculating the partial metric $|\bar{y}_{2N} - r_{2N,2N}\Im\{s_N\}|^2$, and working its way up in a manner similar to successive interference cancellation, until decoding the first layer by calculating the corresponding partial metric $|\bar{y}_1 - (r_{1,1}\Re\{s_1\} + \dots + r_{1,N}\Re\{s_N\} + r_{1,N+1}\Im\{s_1\} + \dots + r_{1,2N}\Im\{s_N\})|^2$. The summation of all partial metrics along the path of a node starting from the root constitutes the weight of that node. If that weight exceeds the square of the sphere radius $d^2$, the algorithm prunes the corresponding branch, declaring it an improbable way to a candidate solution. In other words, all nodes that lead to a solution outside the sphere are pruned at some level of the tree. Whenever a valid lattice point at the bottom level of the tree is found within the sphere, the square of the sphere radius $d^2$ is set to the weight of the newly found point, thus reducing the search space for finding other candidate solutions. Finally, the leaf with the lowest weight is the survivor, and the path along the tree from the root to that leaf represents the estimated solution $\hat{\mathbf{s}}$.
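The tree search just described can be sketched as a depth-first traversal over the triangular system in (5). The code below is a simplified illustration under stated assumptions, not the authors' implementation: the name `sphere_decode`, the per-dimension alphabet `omega`, the example matrix `R`, and the small offset added to `y_bar` are all introduced here for demonstration.

```python
from typing import List, Optional, Sequence, Tuple


def sphere_decode(
    R: Sequence[Sequence[float]],
    y_bar: Sequence[float],
    omega: Sequence[int],
    d2: float,
) -> Tuple[Optional[List[int]], float]:
    """Depth-first search for the x in omega^n with the smallest
    ||y_bar - R x||^2 below d2, where R is n x n upper triangular.
    Returns (x_hat, weight), or (None, d2) if the sphere is empty."""
    n = len(y_bar)
    x = [0] * n
    best_x: Optional[List[int]] = None
    best_d2 = d2

    def visit(level: int, weight: float) -> None:
        nonlocal best_x, best_d2
        # Interference from the levels already decided above this one.
        interference = sum(R[level][k] * x[k] for k in range(level + 1, n))
        for candidate in omega:
            residual = y_bar[level] - interference - R[level][level] * candidate
            node_weight = weight + residual ** 2
            if node_weight >= best_d2:
                continue  # node lies outside the current sphere: prune
            x[level] = candidate
            if level == 0:
                # Leaf inside the sphere: record it and shrink the radius.
                best_x, best_d2 = list(x), node_weight
            else:
                visit(level - 1, node_weight)

    visit(n - 1, 0.0)  # index n-1 is the last layer, i.e. the first tree level
    return best_x, best_d2


# Illustrative 4 x 4 triangular system (a 2x2 complex system in real form).
R = [[2.0, 0.5, -1.0, 0.3],
     [0.0, 1.5, 0.7, -0.2],
     [0.0, 0.0, 1.8, 0.4],
     [0.0, 0.0, 0.0, 1.2]]
x_true = [1, -3, 3, -1]
y_bar = [sum(r * v for r, v in zip(row, x_true)) + 0.01 for row in R]
x_hat, weight = sphere_decode(R, y_bar, omega=(-3, -1, 1, 3), d2=4.0)
```

Note that every node needs the interference term from all levels above it, so the levels cannot be processed independently. This serial dependence is the restriction that Section III sets out to relax.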

To this end, it is important to emphasize the fact that the 
complexity of this algorithm, although it is much lower than 
the ML detection, is still exponential at low SNR, and is 
directly related to the choice of the radius d, as well as the 
number of floating point operations taking place at every tree 
node inside the sphere. 

III. New Lattice Representation 

The lattice representation given in (4) imposes a major restriction on the tree search algorithm. Specifically, the search has to be executed serially from one level to another on the tree. This can be made clearer by writing the partial metric weight formula as

$$w_l(x^{(l)}) = w_{l+1}(x^{(l+1)}) + \Big|\bar{y}_l - \sum_{k=l}^{2N} r_{l,k}x_k\Big|^2 \tag{6}$$

with $l = 2N, 2N-1, \dots, 1$, $w_{2N+1}(x^{(2N+1)}) = 0$, and where $\{x_1, x_2, \dots, x_N\}$ and $\{x_{N+1}, x_{N+2}, \dots, x_{2N}\}$ are the real and imaginary parts of $\{s_1, s_2, \dots, s_N\}$, respectively.

Obviously, the SD algorithm starts from the upper level in the tree ($l = 2N$), traversing down one level at a time and computing the weight for one or more nodes (depending on the search strategy adopted, i.e., depth-first, breadth-first, or other techniques reported in the literature) until finding a candidate solution at the bottom level of the tree ($l = 1$). According to this representation, it is impossible, for instance, to calculate $\sum_{k=l}^{2N} r_{l,k}x_k$ for a node in level $l = 2N-1$ without assigning an estimate for $x_{2N}$. This approach results in two related drawbacks. First, the decoding of any $x_l$ requires an estimate for all preceding $x_j$, $j = l+1, \dots, 2N$. Secondly, there is no room for parallel computations, since the structure of the tree search is sequential.

The main contribution in this paper is that we relax the tree 
search structure making it more flexible for parallelism, and at 
the same time reducing the number of computations required 
at each node by making the decoding of every two adjacent 
levels in the tree totally independent of each other. 

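We start by reshaping the channel matrix representation given in (4) in the following form:

$$\mathbf{H} = \begin{bmatrix}
\Re(H_{1,1}) & -\Im(H_{1,1}) & \cdots & \Re(H_{1,N}) & -\Im(H_{1,N}) \\
\Im(H_{1,1}) & \Re(H_{1,1}) & \cdots & \Im(H_{1,N}) & \Re(H_{1,N}) \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\Re(H_{N,1}) & -\Im(H_{N,1}) & \cdots & \Re(H_{N,N}) & -\Im(H_{N,N}) \\
\Im(H_{N,1}) & \Re(H_{N,1}) & \cdots & \Im(H_{N,N}) & \Re(H_{N,N})
\end{bmatrix} \tag{7}$$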

where $H_{m,n}$ is the i.i.d. complex path gain from transmit antenna $n$ to receive antenna $m$. By looking attentively at the columns of $\mathbf{H}$ starting from the left-hand side, and defining each pair of columns as one set, we observe that the columns in each set are orthogonal, a property that has a substantial effect on the structure of the problem. Using this channel representation changes the order of detection of the received symbols to the following form:

$$[\Re(s_1) \ \ \Im(s_1) \ \ \cdots \ \ \Re(s_N) \ \ \Im(s_N)]^T. \tag{8}$$

This means that the first and second levels of the search tree correspond to the imaginary and real parts of $s_N$, unlike the conventional SD, where those levels correspond to the imaginary parts of $s_N$ and $s_{N-1}$, respectively. The new structure becomes advantageous after applying the QR decomposition to $\mathbf{H}$. By doing so, and due to the special form of orthogonality among the columns of each set, all the elements $r_{k,k+1}$ for $k = 1, 3, \dots, 2N-1$ in the upper triangular matrix $\mathbf{R}$ become zero. The locations of these zeros are very important, since they introduce orthogonality between the real and imaginary parts of every detected symbol.
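The zero pattern claimed above can be checked numerically before reading the proof. The following sketch builds the interleaved matrix of (7) for an arbitrary complex channel and computes the triangular factor with classical Gram-Schmidt. The helper names `interleaved_real_matrix` and `gram_schmidt_R` and the example channel values are assumptions chosen for illustration only.

```python
from typing import List, Sequence


def interleaved_real_matrix(H: Sequence[Sequence[complex]]) -> List[List[float]]:
    """Real-valued matrix of (7): each complex entry h is replaced by the
    2 x 2 block [[Re h, -Im h], [Im h, Re h]]."""
    rows: List[List[float]] = []
    for row in H:
        top: List[float] = []
        bottom: List[float] = []
        for h in row:
            top += [h.real, -h.imag]
            bottom += [h.imag, h.real]
        rows += [top, bottom]
    return rows


def gram_schmidt_R(A: Sequence[Sequence[float]]) -> List[List[float]]:
    """Upper triangular factor R of A = QR via classical Gram-Schmidt."""
    n_rows, n_cols = len(A), len(A[0])
    cols = [[A[i][j] for i in range(n_rows)] for j in range(n_cols)]

    def dot(a: Sequence[float], b: Sequence[float]) -> float:
        return sum(p * q for p, q in zip(a, b))

    e: List[List[float]] = []  # orthonormal vectors found so far
    R = [[0.0] * n_cols for _ in range(n_cols)]
    for k, h in enumerate(cols):
        u = list(h)
        for j, e_j in enumerate(e):
            R[j][k] = dot(e_j, h)  # <e_j, h_k>
            u = [u_i - R[j][k] * e_i for u_i, e_i in zip(u, e_j)]
        R[k][k] = dot(u, u) ** 0.5  # ||u_k||
        e.append([u_i / R[k][k] for u_i in u])
    return R


# Illustrative 3x3 complex channel (values chosen arbitrarily).
H = [[2 + 1j, 0.3 - 0.2j, -0.1 + 0.4j],
     [-0.4 + 0.1j, 1.5 - 1j, 0.2 - 0.6j],
     [0.3 - 0.3j, -0.5 + 0.2j, 1 + 2j]]
R = gram_schmidt_R(interleaved_real_matrix(H))

# r_{k,k+1} for k = 1, 3, 5 in the paper's 1-based indexing.
pair_entries = [R[k][k + 1] for k in range(0, len(R), 2)]
```

For this channel the three entries in `pair_entries` vanish to within rounding error, while other off-diagonal entries such as `R[0][2]` do not.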

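In the following, we prove that the QR decomposition of $\mathbf{H}$ introduces the aforementioned zeros. There are several methods for computing the QR decomposition; we do so by means of the Gram-Schmidt algorithm.
Proof: Let

$$\mathbf{H} = [\mathbf{h}_1 \ \ \mathbf{h}_2 \ \ \cdots \ \ \mathbf{h}_{2N}] \tag{9}$$

where $\mathbf{h}_k$ is the $k$th column of $\mathbf{H}$. Recalling the Gram-Schmidt algorithm, we define

$$\mathbf{u}_1 = \mathbf{h}_1$$

and then,

$$\mathbf{u}_k = \mathbf{h}_k - \sum_{j=1}^{k-1} \phi_{\mathbf{u}_j}\mathbf{h}_k \quad \text{for } k = 2, 3, \dots, 2N,$$

where $\phi_{\mathbf{u}_j}\mathbf{h}_k$ is the projection of vector $\mathbf{h}_k$ onto $\mathbf{u}_j$, defined by

$$\phi_{\mathbf{u}_j}\mathbf{h}_k = \frac{\langle \mathbf{u}_j, \mathbf{h}_k \rangle}{\langle \mathbf{u}_j, \mathbf{u}_j \rangle}\,\mathbf{u}_j \tag{10}$$

and $\mathbf{e}_k = \mathbf{u}_k / \|\mathbf{u}_k\|$ for $k = 1, 2, \dots, 2N$. Rearranging the equations,

$$\begin{aligned}
\mathbf{h}_1 &= \mathbf{e}_1\|\mathbf{u}_1\| \\
\mathbf{h}_2 &= \phi_{\mathbf{u}_1}\mathbf{h}_2 + \mathbf{e}_2\|\mathbf{u}_2\| \\
\mathbf{h}_3 &= \phi_{\mathbf{u}_1}\mathbf{h}_3 + \phi_{\mathbf{u}_2}\mathbf{h}_3 + \mathbf{e}_3\|\mathbf{u}_3\| \\
&\ \ \vdots \\
\mathbf{h}_k &= \sum_{j=1}^{k-1} \phi_{\mathbf{u}_j}\mathbf{h}_k + \mathbf{e}_k\|\mathbf{u}_k\|.
\end{aligned}$$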



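Now, writing these equations in matrix form, and using $\phi_{\mathbf{u}_j}\mathbf{h}_k = \langle \mathbf{e}_j, \mathbf{h}_k \rangle\,\mathbf{e}_j$, we get:

$$\mathbf{H} = [\mathbf{e}_1 \ \ \mathbf{e}_2 \ \ \cdots \ \ \mathbf{e}_{2N}]
\begin{bmatrix}
\|\mathbf{u}_1\| & \langle \mathbf{e}_1, \mathbf{h}_2 \rangle & \langle \mathbf{e}_1, \mathbf{h}_3 \rangle & \cdots \\
0 & \|\mathbf{u}_2\| & \langle \mathbf{e}_2, \mathbf{h}_3 \rangle & \cdots \\
0 & 0 & \|\mathbf{u}_3\| & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix} \tag{11}$$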



Obviously, the matrix to the left is the orthogonal matrix $\mathbf{Q}$, and the one to the right is the upper triangular matrix $\mathbf{R}$. Now our task is to show that the terms $\langle \mathbf{e}_k, \mathbf{h}_{k+1} \rangle$ are zero for $k = 1, 3, \dots, 2N-1$. Three observations conclude the proof.

First, since $\mathbf{h}_k$ and $\mathbf{h}_{k+1}$ are orthogonal for $k = 1, 3, \dots, 2N-1$, then $\phi_{\mathbf{h}_k}\mathbf{h}_{k+1} = \phi_{\mathbf{h}_{k+1}}\mathbf{h}_k = 0$ for the same $k$.

Second, the projection of $\mathbf{u}_m$ for $m = 1, 3, \dots, k-2$ on the columns $\mathbf{h}_k$ and $\mathbf{h}_{k+1}$, respectively, is equal to the projection of $\mathbf{u}_{m+1}$ on the columns $\mathbf{h}_{k+1}$ and $-\mathbf{h}_k$, respectively. To formalize this:

$$\begin{aligned}
\langle \mathbf{u}_m, \mathbf{h}_k \rangle &= \langle \mathbf{u}_{m+1}, \mathbf{h}_{k+1} \rangle \\
\langle \mathbf{u}_m, \mathbf{h}_{k+1} \rangle &= -\langle \mathbf{u}_{m+1}, \mathbf{h}_k \rangle
\end{aligned} \tag{12}$$

for $k = 1, 3, \dots, 2N-1$ and $m = 1, 3, \dots, k-2$. This property becomes obvious by using the first observation and revisiting the special structure of $\mathbf{H}$ in (7).

Third, making use of the first two observations, and noting that $\|\mathbf{h}_k\| = \|\mathbf{h}_{k+1}\|$ for $k = 1, 3, \dots, 2N-1$, it can be easily shown that $\|\mathbf{u}_k\| = \|\mathbf{u}_{k+1}\|$ for the same $k$.
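Then,

$$\langle \mathbf{e}_k, \mathbf{h}_{k+1} \rangle = \frac{1}{\|\mathbf{u}_k\|}\langle \mathbf{u}_k, \mathbf{h}_{k+1} \rangle = \frac{1}{\|\mathbf{u}_k\|}\Big\langle \mathbf{h}_k - \sum_{i=1}^{k-1}\phi_{\mathbf{u}_i}\mathbf{h}_k,\ \mathbf{h}_{k+1} \Big\rangle. \tag{13}$$

Now, applying the above observations to (13), we get

$$\begin{aligned}
\langle \mathbf{e}_k, \mathbf{h}_{k+1} \rangle
&= \frac{1}{\|\mathbf{u}_k\|}\Big(0 - \sum_{i=1}^{k-1}\frac{\langle \mathbf{h}_k, \mathbf{u}_i \rangle \langle \mathbf{u}_i, \mathbf{h}_{k+1} \rangle}{\|\mathbf{u}_i\|^2}\Big) \\
&= -\frac{1}{\|\mathbf{u}_k\|}\sum_{m=1,3,\dots}^{k-2}\frac{\langle \mathbf{h}_k, \mathbf{u}_m \rangle \langle \mathbf{u}_m, \mathbf{h}_{k+1} \rangle - \langle \mathbf{u}_m, \mathbf{h}_{k+1} \rangle \langle \mathbf{h}_k, \mathbf{u}_m \rangle}{\|\mathbf{u}_m\|^2} = 0,
\end{aligned}$$

where the first term vanishes because $\mathbf{h}_k$ and $\mathbf{h}_{k+1}$ are orthogonal, and the terms of the sum are grouped in pairs $(m, m+1)$ that cancel by (12) together with $\|\mathbf{u}_m\| = \|\mathbf{u}_{m+1}\|$.

This concludes the proof. ■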

In this context, the SD algorithm executes in the following way. First, the partial metric weight $|\bar{y}_{2N} - r_{2N,2N}x_{2N}|^2$ for the $\mu$ nodes in the first level of the tree is computed, where $\mu$ is the number of elements in $\Omega$. This metric is then checked against the specified squared sphere radius $d^2$. If the weight at any node is greater than $d^2$, the corresponding branch is pruned. Otherwise, the metric value is saved for the next step. At the same time, another set of $\mu$ partial metric computations of the form $|\bar{y}_{2N-1} - r_{2N-1,2N-1}x_{2N-1}|^2$ takes place at the second level, since these two levels are independent as proved above. These metrics are checked against $d^2$ in the same way as in the level above. The weights of the survivor nodes from both levels are summed, and the sum is checked against the sphere constraint, ending up with a set of survivor $s_N$ symbols. Secondly, the remaining symbols $s_{N-1}, \dots, s_1$ are estimated by quantization to the nearest constellation element in $\Omega$. In other words, the values of $x_{2N-2}, \dots, x_1$ are calculated recursively for each combination of survived $x_{2N}, x_{2N-1}$, and the total weight given by $\|\bar{\mathbf{y}} - \mathbf{R}\mathbf{s}\|^2$ is determined at the bottom level of the tree for those leaves that obey the radius constraint. Finally, the leaf with the minimum weight is chosen to be the decoded message ($\hat{\mathbf{s}}$). This can be formalized as
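Step 1:

    for $i = 1$ to $\mu$
        $\hat{x}_{2N} = \Omega(i)$
        $\hat{x}_{2N-1} = \Omega(i)$
        if $|\bar{y}_{2N} - r_{2N,2N}\hat{x}_{2N}|^2 < d^2$ → add to survivor set 1
        else prune branch
        if $|\bar{y}_{2N-1} - r_{2N-1,2N-1}\hat{x}_{2N-1}|^2 < d^2$ → add to survivor set 2
        else prune branch
    next $i$

    save all combinations of $\hat{x}_{2N}, \hat{x}_{2N-1}$ whose weight summations comply with the radius constraint. Denote the number of survivors at the end of this step by $\lambda$.

Step 2: (for every one of the $\lambda$ combinations, calculate $\hat{x}_{2N-2}, \dots, \hat{x}_1$ recursively as shown below)

    for $l = 2N-2$ to $1$, step $-2$
        set $v = l/2$, and calculate

        $e_l = \sum_{k=l+1}^{2N} r_{l,k}\hat{x}_k$,  $\hat{x}_l = \Big\lfloor \dfrac{\bar{y}_l - e_l}{r_{l,l}} \Big\rceil$

        $e_{l-1} = \sum_{k=l+1}^{2N} r_{l-1,k}\hat{x}_k$,  $\hat{x}_{l-1} = \Big\lfloor \dfrac{\bar{y}_{l-1} - e_{l-1}}{r_{l-1,l-1}} \Big\rceil$

        $w_l(x^{(l)}) = w_{l+2}(x^{(l+2)}) + |\bar{y}_l - e_l - r_{l,l}\hat{x}_l|^2 + |\bar{y}_{l-1} - e_{l-1} - r_{l-1,l-1}\hat{x}_{l-1}|^2$
    next $l$

where $\lfloor \cdot \rceil$ quantizes the value $(\cdot)$ to the closest element in the set $\Omega$. The sum in $e_{l-1}$ starts at $k = l+1$ because $r_{l-1,l} = 0$. The output of the above two steps is a set of candidate solutions $\hat{x}_{2N}, \dots, \hat{x}_1$ with corresponding weights.

Step 3:

    choose the set $\hat{x}_{2N}, \dots, \hat{x}_1$ that has the lowest weight to be the detected message.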
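The three steps above can be sketched in Python as follows. This is a simplified reading of the procedure under stated assumptions, not the authors' code: the names `proposed_decode` and `quantize`, the per-dimension alphabet `omega`, and the example matrix `R` (constructed by hand so that its entries $r_{k,k+1}$ for odd $k$ are zero) are introduced here for illustration.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple


def quantize(value: float, omega: Sequence[int]) -> int:
    """Closest element of omega to value."""
    return min(omega, key=lambda c: abs(c - value))


def proposed_decode(
    R: Sequence[Sequence[float]],
    y_bar: Sequence[float],
    omega: Sequence[int],
    d2: float,
) -> Tuple[Optional[List[int]], float]:
    """Two-step search; assumes R is upper triangular with an even size and
    R[k][k+1] == 0 for k = 0, 2, 4, ... (0-based indexing)."""
    n = len(y_bar)              # n = 2N real dimensions
    top, second = n - 1, n - 2  # 0-based positions of x_{2N} and x_{2N-1}

    def level_survivors(i: int) -> List[Tuple[int, float]]:
        weights = [(c, (y_bar[i] - R[i][i] * c) ** 2) for c in omega]
        return [(c, w) for c, w in weights if w < d2]

    # Step 1: the two top levels are tested independently, then combined.
    pairs = [
        (a, b, wa + wb)
        for (a, wa), (b, wb) in product(level_survivors(top), level_survivors(second))
        if wa + wb < d2
    ]

    best_x: Optional[List[int]] = None
    best_w = float("inf")
    for a, b, w in pairs:
        x = [0] * n
        x[top], x[second] = a, b
        # Step 2: quantize the remaining levels, two at a time.
        for l in range(n - 3, 0, -2):
            for i in (l, l - 1):
                # R[l-1][l] is zero, so both sums may start at l + 1.
                e = sum(R[i][k] * x[k] for k in range(l + 1, n))
                x[i] = quantize((y_bar[i] - e) / R[i][i], omega)
                w += (y_bar[i] - e - R[i][i] * x[i]) ** 2
        # Step 3: keep the lowest-weight leaf that obeys the radius constraint.
        if w < d2 and w < best_w:
            best_x, best_w = x, w
    return best_x, best_w


# Illustrative triangular matrix with the zero pattern of Section III.
R = [[2.0, 0.0, 0.5, -0.3],
     [0.0, 2.0, 0.3, 0.5],
     [0.0, 0.0, 1.5, 0.0],
     [0.0, 0.0, 0.0, 1.5]]
x_true = [1, -3, 3, -1]
y_bar = [sum(r * v for r, v in zip(row, x_true)) for row in R]  # noise-free
x_hat, weight = proposed_decode(R, y_bar, omega=(-3, -1, 1, 3), d2=4.0)
```

The two calls to `level_survivors` in Step 1 do not depend on each other, which is where the parallelism mentioned in the Introduction would enter. The sketch evaluates them sequentially for simplicity.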

Finally, the above algorithm's complexity is linear in the number of antennas, and the performance is optimal for MIMO systems having two antennas at both ends. However, this performance becomes suboptimal for systems with $N \geq 3$ (e.g., there is a 4 dB loss compared to the conventional SD at a
BER of 10^^ for a 4x4 system ). This is mainly due to the use 
of quantization, which takes place at all tree levels except the first two and makes the estimation of the $x_l$ values loose as we traverse further down the tree. Thus, we introduce minor heuristic rules in the middle levels of the tree when $N \geq 3$, while still using the above steps at the very first and very last two levels in the tree, in order to obtain near-optimal performance (less than 1 dB loss) at a complexity that remains much smaller than that of the conventional SD. A brief discussion of how to specify these rules is given in Section IV.

IV. Simulation Results 

We have considered 2x2, 4x4, and 6x6 cases using 16-QAM and 64-QAM modulation schemes. As mentioned in the previous section, we introduce heuristic rules in the middle levels of the tree when $N \geq 3$. Therefore, in our simulations for the 4x4 and 6x6 cases, we executed the algorithm in the following way:

For the 4x4 system, the first two levels of the tree, which correspond to the imaginary and real parts of the symbol $s_4$, are treated the same way as explained in Step 1 of the algorithm. For each survivor $s_4$, the weight for all $\mu^2$ possibilities of $s_3$ is calculated, and those weights that violate the radius constraint are dismissed ($d^2 = 2\sigma^2 N$ [8]). The best 8 survivors, in other words the 8 values of $s_3$ that have the lowest weights, are kept for the next steps, while the paths corresponding to the others are pruned. In the third pair of levels, the same procedure performed in the previous step is applied, and the best 8 values of $s_2$ are kept. Finally, a quantization process followed by an estimation of the transmitted message is carried out exactly as in Steps 2 and 3 explained in the previous section. The 6x6 case follows a similar approach but with different parameters. The first two levels are treated as explained in Step 1. For the 16-QAM (64-QAM) case, the best 16, 8, and 4 (32, 32, and 16) survivors of $s_6$, $s_5$, and $s_4$, respectively, are kept in the middle levels until reaching the last four levels, which are then processed by quantization in order to obtain $\hat{s}_2$ and $\hat{s}_1$.
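The survivor selection used in the middle levels can be sketched as below. This is an illustrative helper only: the name `keep_best`, the path representation (a tuple of partial decisions paired with its accumulated weight), and the sample numbers are assumptions made here, while the rule itself (dismiss paths outside the radius, then keep the lowest-weight ones) follows the description above.

```python
import heapq
from typing import List, Sequence, Tuple

# (partial symbol decisions, accumulated weight); a hypothetical representation
Path = Tuple[Tuple[int, ...], float]


def keep_best(paths: Sequence[Path], d2: float, k: int) -> List[Path]:
    """Dismiss paths that violate the radius constraint, then keep the k
    paths with the lowest accumulated weight (sorted by increasing weight)."""
    inside = [p for p in paths if p[1] < d2]
    return heapq.nsmallest(k, inside, key=lambda p: p[1])


paths = [((3, 1), 0.9), ((1, 1), 0.2), ((-1, 3), 5.0), ((1, -3), 0.4)]
survivors = keep_best(paths, d2=4.0, k=2)
```

With `k` set to 8 for the 4x4 system, or to the per-level values quoted above for the 6x6 system, the number of paths carried to the next level is bounded regardless of the SNR.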

Figure 1 reports the performance of the proposed algorithm versus the conventional SD for the 2x2, 4x4, and 6x6 cases using 16-QAM modulation. We observe that, for the 2x2 case, the proposed algorithm achieves exactly the same performance as the conventional SD, but with much smaller complexity, as shown in Figure 2. However, there is a performance loss of almost 0.5 to 1 dB in the proposed 4x4 and 6x6 cases compared to the conventional SD. This loss is due to the adoption of the K-best criterion at certain levels of the tree, as well as to the quantization process applied at the low levels of the tree, as mentioned above. From Figure 2, it is clear that the proposed algorithm reduces the complexity by 80% for the 2x2 case, and by 50% for both the 4x4 and 6x6 systems. Figures 3 and 4 show the performance and complexity curves for the 2x2, 4x4, and 6x6 cases with 64-QAM modulation. Again, the performance is shown to be close to that of the conventional SD for the 2x2 case, with a degradation of almost 0.5 to 1 dB for the 4x4 and 6x6 cases. The differences in complexity between the proposed and conventional SD are within the same range as in the 16-QAM case.

V. Conclusions 

A simple and general lattice representation in the context of sphere decoding was proposed in this paper. The performance of the proposed structure was shown to be optimal for 2x2 systems and close to optimal (0.5 to 1 dB loss) in the 4x4 and 6x6 cases. A complexity reduction of 80% was attainable for the 2x2 case, and 50% for the 4x4 and 6x6 cases, compared to the corresponding conventional SD.

Fig. 1. BER vs SNR for the proposed and conventional SD over a 2x2, 4x4, and 6x6 MIMO flat fading channel using 16-QAM modulation.

Fig. 2. Total number of floating point operations vs SNR for the proposed and conventional SD over a 2x2, 4x4, and 6x6 MIMO flat fading channel using 16-QAM modulation.

Fig. 3. BER vs SNR for the proposed and conventional SD over a 2x2, 4x4, and 6x6 MIMO flat fading channel using 64-QAM modulation.

Fig. 4. Total number of floating point operations vs SNR for the proposed and conventional SD over a 2x2, 4x4, and 6x6 MIMO flat fading channel using 64-QAM modulation.